Preface


This  paper  describes  how  regression  diagnostics  were  used  to  help 
develop  revised  cost-estimating  relationships  for  jet  engines.  The 
goal  was  to  derive  meaningful,  yet  easy  to  use  models  based  on  an 
updated  collection  of  few  observations  and  many  variables.  First, 
specific  criteria  were  established  for  selecting  explanatory 
variables.  A  variety  of  numerical  and  graphical  techniques  were  then 
used  to  critique  candidate  models  by  examining  residuals  and 
evaluating  the  influence  of  individual  engines.  The  final  models  are 
not  only  intuitively  satisfying,  but  generally  provide  better 
predictions  and  are  easier  to  use  than  earlier  models.  Additionally, 
the  user  is  provided  with  a  greater  understanding  of  the  design  and 
sensitivity  of  the  models,  and  therefore  a  better  understanding  of  the 
actual  estimates. 

This research was undertaken as part of the "Air Force Resource
Financial Management Issues for the 1980s" project of Rand's Project
AIR FORCE.


Introduction 


The propulsion system of most new military aircraft is a turbine
engine. These complex engines can cost over a billion dollars for
development and product improvement. In fact, the propulsion system
accounts for as much as 25% of the flyaway cost of a fighter aircraft.
Consequently, Rand has, over the years, developed mathematical models
for use in the early planning stages of engines, which provide
estimates of development and production cost. One of the most popular
sets of these cost-estimating relationships (CERs) has been
incorporated into a Rand-developed computer model of aircraft system
costs known as DAPCA (Development and Procurement Cost of Aircraft) [1].

The  DAPCA  model  estimates  engine  costs  using  the  results  from  two 
regression  equations.  Experience  has  shown  the  estimates  to  be  quite 
sensitive  to  small  changes  in  the  data  used  to  drive  the  equations,  and 
likely  to  underestimate  costs  for  the  latest  high  performance  engines. 

A  revised  set  of  CERs  has  been  developed  by  Birkler,  Garfinkle,  and 
Marks  [2]  which  not  only  incorporates  data  from  engines  developed  since 
the  DAPCA  equations  were  derived,  but  also  takes  advantage  of  recently 
available  statistical  and  computational  tools. 

This  paper  focuses  on  one  of  these  new  engine  equations  as  a  way 
of  describing  our  experiences  with  these  tools. 

A  Typical  CER 

The Model Qualification Test (MQT) is a series of tests used to
demonstrate an engine's suitability for production. It is a major
milestone in the development of aircraft turbine engines, marking the
transition between development and production. For conceptual planning
studies, preliminary tradeoff analyses, etc., it is desirable to have
an easy-to-use procedure for estimating, within ±25%, the cost of
a proposed engine through the successful completion of MQT. An
equation  that  produces  this  estimate  should  require  few  input 
parameters,  all  of  which  should  be  available  during  the  concept 
formulation  stage  --  before  blueprints,  orders  for  material,  and  other 
engine  components  can  be  specified.  Also,  the  predictive  capabilities 
of  the  model  should  be  well  understood.  Finally,  the  signs  of  the 
coefficients  should  be  consistent  with  intuitive  notions  of  the 
relationship  between  technology  and  cost.  A  CER  that  has  these 
features  will  not  only  have  a  variety  of  applications  but  will  better 
stand  the  test  of  time. 

Candidate  Explanatory  Variables 

Our  search  for  suitable  explanatory  variables  began  with 
the  hypothesis  that  the  cost  of  an  engine  is  a  function  of  (1)  the 
size  of  the  engine,  (2)  the  level  of  technology/performance 
incorporated  into  the  engine,  and  (3)  the  time  during  which  the  engine 
is produced [3]. We also required that a candidate variable be
logically related to cost and be available (with a reasonable amount of
accuracy) early in the planning cycle. Our final set of candidate
explanatory variables that satisfied these criteria is shown in Table 1.
Our dependent variable is the cost of development up to successful
completion of MQT (MQTDEVCOST).
Table 1

CANDIDATE EXPLANATORY VARIABLES

Size        Performance/Technology         Time
----------  -----------------------------  -----------------
* Thrust    * Turbine inlet temperature    * Time of Arrival
  Weight      Thrust to weight ratio
* Airflow     Mach number
              Total pressure
              Specific fuel consumption
              Thrust per pound of airflow

* These variables may be easier to obtain in a
  long range planning study than others.

Cost Data

Data on sixteen turbojet and turbofan engines were collected
from the companies that developed and produced them. The data were
adjusted so that comparisons could be made among engines based on
constant dollars, like quantities, and generally similar
developmental strategies.

Transformations  of  the  Data 

The specific form of a CER (whether it is linear or logarithmic
in the independent and dependent variables) implies various assumptions
about technology and cost trends [4]. For example, a model that is
linear in both the independent and dependent variables (linear-linear)
implies a constant relationship between cost and technology, whereas
a log-linear model implies an acceleration in cost. Although it may
seem unreasonable to expect either relationship to hold indefinitely
over time, these forms could describe certain phases of engine
technology. A linear-log model implies a deceleration or an
acceleration in cost depending on the values of the coefficients of the
logarithmic terms. The log-log model implies a deceleration in cost,
as  might  be  expected  in  a  mature  technology.  To  help  answer  the 
question  of  which  equation  form  best  describes  the  actual  relationship 
between  engine  cost  and  engine  characteristics,  all  four  forms  were 
studied  using  a  variety  of  diagnostic  tools. 
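
The curvature implied by each form can be made explicit for a single explanatory variable. The block below is an illustrative sketch rather than the paper's own notation: the symbols y (cost), x (an engine characteristic), and the coefficients a and b are introduced here as assumptions, with x > 0 and y > 0.

```latex
\begin{align*}
\text{linear in both:} \quad & y = a + b x, & \frac{dy}{dx} &= b, & \frac{d^2 y}{dx^2} &= 0 \\
\text{log in cost only:} \quad & \ln y = a + b x, & \frac{dy}{dx} &= b y, & \frac{d^2 y}{dx^2} &= b^2 y \\
\text{log in } x \text{ only:} \quad & y = a + b \ln x, & \frac{dy}{dx} &= \frac{b}{x}, & \frac{d^2 y}{dx^2} &= -\frac{b}{x^2} \\
\text{log in both:} \quad & \ln y = a + b \ln x, & \frac{dy}{dx} &= \frac{b y}{x}, & \frac{d^2 y}{dx^2} &= \frac{b (b - 1) y}{x^2}
\end{align*}
```

A positive second derivative corresponds to accelerating cost and a negative one to decelerating cost. For the form that is logarithmic in x only, the sign depends on the sign of b; the form that is logarithmic in both decelerates when 0 < b < 1.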

Diagnostic  Tools 

Our first step in developing a CER for MQTDEVCOST was to compute
all possible linear least-squares regressions [5] using from one to
five of the candidate independent variables. Advances in computer
algorithms helped to hold this procedure down to a reasonable cost [6].
Mallows' C(p) [7], the multiple correlation coefficient, and our
knowledge of propulsion engineering were all used in selecting a small
subset of these models for further analysis.
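
An all-subsets search ranked by Mallows' C(p) might be sketched as follows. This is a minimal illustration on synthetic data, not the procedure or data used in the study: the function names, the variable names x1 to x4, and the choice to estimate the error variance from the model containing every candidate are all assumptions made here.

```python
import itertools
import numpy as np


def sse(X, y):
    """Residual sum of squares from a least-squares fit of y on X.

    X must already contain the intercept column.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def all_subsets_cp(X, y, names, max_size):
    """Fit every subset of up to max_size columns of X; rank by Mallows' C(p)."""
    n, k = X.shape
    ones = np.ones((n, 1))
    # Error variance estimated from the model containing all k candidates.
    s2 = sse(np.hstack([ones, X]), y) / (n - k - 1)
    results = []
    for size in range(1, max_size + 1):
        for cols in itertools.combinations(range(k), size):
            p = size + 1  # number of parameters, counting the intercept
            cp = sse(np.hstack([ones, X[:, list(cols)]]), y) / s2 - (n - 2 * p)
            results.append((cp, tuple(names[c] for c in cols)))
    return sorted(results)


# Synthetic stand-in data: 16 observations, 4 hypothetical candidates.
rng = np.random.default_rng(0)
X = rng.normal(size=(16, 4))
y = 2.0 + 3.0 * X[:, 0] - 1.5 * X[:, 2] + 0.1 * rng.normal(size=16)
names = ["x1", "x2", "x3", "x4"]

ranked = all_subsets_cp(X, y, names, max_size=4)
for cp, subset in ranked[:3]:
    print(f"C(p) = {cp:6.2f}  variables = {subset}")
```

Subsets whose C(p) is small and close to their parameter count p are the usual candidates for closer study. By construction, the model containing every candidate always has C(p) equal to its own p, so C(p) cannot judge that model.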

The  next  step  was  to  compute  full  sets  of  statistics  for  the 
candidate  models.  We  also  graphically  compared  the  residuals  from  each 
model  to  its  fitted  values  and  to  its  independent  variables;  partial 
regression  plots  were  used  to  provide  further  indications  of  the 
behavior  of  individual  data  points.  The  candidate  models  were  also 
examined for evidence of collinearity by reviewing the condition index
of  the  data  matrix  and  the  decomposition  of  the  estimated  regression 
coefficient  variances  [8]. 
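
The condition indexes and the variance decomposition can be computed from a singular value decomposition of the column-scaled data matrix. The sketch below follows that general recipe on synthetic data; the function name, the example columns, and the threshold of about 30 mentioned afterward are assumptions of this illustration, not values taken from the study.

```python
import numpy as np


def collinearity_diagnostics(X):
    """Condition indexes and variance-decomposition proportions.

    X is the data matrix including the intercept column. Returns
    (cond_index, props), where props[j, k] is the share of the variance
    of coefficient k associated with singular value j.
    """
    Xs = X / np.linalg.norm(X, axis=0)  # scale each column to unit length
    _, mu, Vt = np.linalg.svd(Xs, full_matrices=False)
    cond_index = mu.max() / mu
    phi = (Vt.T ** 2) / mu ** 2  # phi[k, j] = v_kj^2 / mu_j^2
    props = phi / phi.sum(axis=1, keepdims=True)
    return cond_index, props.T


# Synthetic example: x2 is nearly a copy of x1, x3 is unrelated.
rng = np.random.default_rng(1)
n = 16
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])

cond_index, props = collinearity_diagnostics(X)
worst = int(np.argmax(cond_index))
print("condition indexes:", np.round(cond_index, 1))
print("variance proportions at the largest index:", np.round(props[worst], 3))
```

A large condition index (a threshold of about 30 is often quoted as a rule of thumb) paired with high variance proportions for two or more coefficients points to the variables involved in a near dependency.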

Furthermore, a variety of influence measures were used in our
search for the "best" model, especially those described by Belsley,
Kuh, and Welsch [8] and Cook [9]. These measures indicate the effects
of  deleting  an  observation  from  the  data  base.  In  general,  we 
considered  an  engine  influential  if  its  deletion  resulted  in  large 
changes  in  various  characteristics  of  the  model. 
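
The single-observation deletion measures can all be obtained from one fit, without refitting the model once per engine. The following is a minimal sketch using the standard closed-form expressions; the function name and the synthetic data (including the deliberately unusual first row) are assumptions made for illustration and do not reproduce the engine data.

```python
import numpy as np


def influence_measures(X, y):
    """Single-row deletion diagnostics for a least-squares fit.

    X includes the intercept column. Returns a dict of arrays with one
    value (or one row, for DFBETAS) per observation.
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # hat diagonal
    s2 = e @ e / (n - p)
    # Error variance with observation i deleted, obtained without refitting.
    s2_del = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)
    rstudent = e / np.sqrt(s2_del * (1 - h))
    dffits = rstudent * np.sqrt(h / (1 - h))
    cooks_d = e**2 * h / (p * s2 * (1 - h) ** 2)
    covratio = 1.0 / (((n - p - 1 + rstudent**2) / (n - p)) ** p * (1 - h))
    # Change in each coefficient when observation i is deleted, then scaled.
    dbeta = (X @ XtX_inv) * (e / (1 - h))[:, None]
    dfbetas = dbeta / np.sqrt(s2_del[:, None] * np.diag(XtX_inv))
    return {"hat": h, "rstudent": rstudent, "dffits": dffits,
            "cooks_d": cooks_d, "covratio": covratio, "dfbetas": dfbetas}


# Synthetic data: 16 observations, intercept plus 3 explanatory variables.
rng = np.random.default_rng(2)
n = 16
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
X[0, 1] = 4.0  # give the first row unusually high leverage
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + 0.3 * rng.normal(size=n)
y[0] += 2.0    # and move its response off the common trend

measures = influence_measures(X, y)
print("largest Cook's distance at row", int(np.argmax(measures["cooks_d"])))
print("hat diagonal:", np.round(measures["hat"], 2))
```

The hat diagonal depends only on the explanatory variables, so it flags unusual engine characteristics; the other measures combine leverage with the size of the residual.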

The  Preferred  Equation 

The  preferred  equation  for  estimating  engine  development  costs 
through  MQT  is  displayed  both  numerically  and  graphically  in  Figure  1. 
The  relationship  shown  there  has  three  explanatory  variables,  all  of 
which  have  intuitive  appeal  as  well  as  statistical  significance  [10]. 
The  points  on  the  plot  represent  the  sixteen  engines  in  the  data  base. 
Costs for engines above the 45-degree line have been overestimated
while costs for engines below the line have been underestimated. A
similar plot was used with other models in order to detect outliers
and to examine possible trends in the distribution of the residuals.

Figure 1

DEVELOPMENT COST (MQT), PREFERRED EQUATION

The studentized residual, hat diagonal, covariance ratio, DFFITS,
DFBETAS, and Cook's distance for each data point used in the model are
shown in Table 2. Before beginning the analysis we expected the F100,
the TF39, and the J58 engines to be flagged by these statistics. The
F100  should  be  an  influential  observation  because  it  is  the  most 
technically  advanced  engine  in  the  sample,  having  the  highest  turbine 
inlet  temperature  and  thrust-to-weight  ratio.  The  TF39  should  also  be 
identified  as  an  influential  observation  because,  although  it  is  a 
subsonic engine, it has the highest thrust rating of all the engines in
the  sample.  The  large  thrust  output  is  due  to  its  large  size  and  the 
fact  that  much  of  its  thrust  is  generated  in  a  mode  quite  different 
from  the  other  engines  in  the  sample.  Also,  it  is  the  only  large, 
transport  type  of  engine  in  the  data  base.  The  third  expected  outlier 
is  the  J58.  This  engine  is  the  only  engine  in  the  sample  designed  for 
a  high  altitude,  high  speed  reconnaissance  mission,  which  requires  a 
considerably  different  design  and  testing  approach. 

A routine analysis of the residuals identified only the TF30 as
an outlier. The influence measures did a better job in this
regard. For many of the diagnostics the F100 was flagged as an
important data point in influencing the regression coefficients. In
addition, several of the diagnostics identified the TF39 as being
potentially different from other engines in the sample. The J58 had
the highest Cook's distance and conspicuously exceeded the extreme
values for DFFITS and two of the four DFBETAS.
As we initially expected, the F100, TF39, and J58 proved consistently
to be the most influential engines in this model. Because future
engines are expected to be more like these than the others in the
data set, this is an acceptable model.
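
The text does not state which cutoff values were used to flag an engine. As a hedged illustration, the sketch below evaluates the size-adjusted rule-of-thumb cutoffs commonly attributed to Belsley, Kuh, and Welsch for a sample of sixteen observations and four coefficients (three explanatory variables plus an intercept, consistent with the four DFBETAS mentioned above). The function name and dictionary keys are hypothetical.

```python
import math


def bkw_cutoffs(n, p):
    """Size-adjusted rule-of-thumb cutoffs for n observations, p coefficients."""
    return {
        "hat": 2 * p / n,
        "dffits": 2 * math.sqrt(p / n),
        "dfbetas": 2 / math.sqrt(n),
        "covratio_band": 3 * p / n,  # flag when |COVRATIO - 1| exceeds this
    }


# Sixteen engines; three explanatory variables plus an intercept.
cutoffs = bkw_cutoffs(n=16, p=4)
print(cutoffs)
# {'hat': 0.5, 'dffits': 1.0, 'dfbetas': 0.5, 'covratio_band': 0.75}
```

With so few observations these thresholds are loose guides at best, which is one reason to read them alongside engineering judgment about which engines ought to be unusual.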

A  large  number  of  other  models  for  estimating  engine  cost  were 
analyzed  before  the  final  set  of  preferred  equations  was  chosen.  The 
competing  models  each  had  at  least  one  of  the  following  drawbacks: 

-  candidate  variables  were  not  significant 

-  coefficients  had  counterintuitive  signs 

-  the  model  had  a  large  estimating  error 

- independent variables exhibited collinearity

-  influential  engines  were  unlike  expected  future  engines. 


Summary 

The estimating relationships derived in this project provide
improvements in engine cost estimating capability over the DAPCA
equations. Major strong points of these new relationships are
intuitive appeal, ease of use, fewer independent variables, and low
estimating error. Also, we have insight into the influence of each
engine in the data base on the final models. Additional engine
developments since the time the DAPCA equations were derived have
yielded useful data that have been added to the data base so that it
represents a wider range of engine characteristics.

The  results  described  in  this  study  are  intended  for  estimating 
the  cost  of  large,  modern  aircraft  engines  in  the  context  of  long-range 
planning  studies.  Any  new  engine  to  be  estimated  must  be  consistent 
with the basic assumptions and limited data from which the CERs were
derived. Specifically, the CERs apply to development and pricing
practices similar to those of the 1960s and 1970s. The models also
require the reasonable assumption that basic gas turbine design in the
future will be similar to that of today.

Throughout  this  study  regression  diagnostics  were  used  as  tools  to 
supplement  our  understanding  of  the  data.  The  diagnostics  not  only 
identified  influential  data  points  that  would  have  otherwise  gone 
undetected,  but  guided  us  to  a  model  in  which  we  can  believe. 